mask vector
SOLQ: Segmenting Objects by Learning Queries
In this paper, we propose an end-to-end framework for instance segmentation. Based on the recently introduced DETR, our method, termed SOLQ, segments objects by learning unified queries. In SOLQ, each query represents one object and has multiple representations: class, location and mask. The object queries learned perform classification, box regression and mask encoding simultaneously in an unified vector form. During training phase, the mask vectors encoded are supervised by the compression coding of raw spatial masks. In inference time,mask vectors produced can be directly transformed to spatial masks by the inverse process of compression coding. Experimental results show that SOLQ can achieve state-of-the-art performance, surpassing most of existing approaches. Moreover, the joint learning of unified query representation can greatly improve the detection performance of DETR. We hope our SOLQ can serve as a strong baseline for the Transformer-based instance segmentation.
Merging Smarter, Generalizing Better: Enhancing Model Merging on OOD Data
Zhang, Bingjie, Li, Hongkang, Shi, Changlong, Rong, Guowei, Zhao, He, Wang, Dongsheng, Guo, Dandan, Wang, Meng
Multi-task learning (MTL) concurrently trains a model on diverse task datasets to exploit common features, thereby improving overall performance across the tasks. Recent studies have dedicated efforts to merging multiple independent model parameters into a unified model for MTL, thus circumventing the need for training data and expanding the scope of applicable scenarios of MTL. However, current approaches to model merging predominantly concentrate on enhancing performance within in-domain (ID) datasets, often overlooking their efficacy on out-of-domain (OOD) datasets. In this work, we proposed LwPTV (Layer-wise Pruning Task Vector) by building a saliency score, measuring the redundancy of parameters in task vectors. Designed in this way ours can achieve mask vector for each task and thus perform layer-wise pruning on the task vectors, only keeping the pre-trained model parameters at the corresponding layer in merged model. Owing to its flexibility, our method can be seamlessly integrated with most of existing model merging methods to improve their performance on OOD tasks. Extensive experiments demonstrate that the application of our method results in substantial enhancements in OOD performance while preserving the ability on ID tasks.
SOLQ: Segmenting Objects by Learning Queries
In this paper, we propose an end-to-end framework for instance segmentation. Based on the recently introduced DETR, our method, termed SOLQ, segments objects by learning unified queries. In SOLQ, each query represents one object and has multiple representations: class, location and mask. The object queries learned perform classification, box regression and mask encoding simultaneously in an unified vector form. During training phase, the mask vectors encoded are supervised by the compression coding of raw spatial masks.
AMCEN: An Attention Masking-based Contrastive Event Network for Two-stage Temporal Knowledge Graph Reasoning
Yang, Jing, Wang, Xiao, Wang, Yutong, Wang, Jiawei, Wang, Fei-Yue
Temporal knowledge graphs (TKGs) can effectively model the ever-evolving nature of real-world knowledge, and their completeness and enhancement can be achieved by reasoning new events from existing ones. However, reasoning accuracy is adversely impacted due to an imbalance between new and recurring events in the datasets. To achieve more accurate TKG reasoning, we propose an attention masking-based contrastive event network (AMCEN) with local-global temporal patterns for the two-stage prediction of future events. In the network, historical and non-historical attention mask vectors are designed to control the attention bias towards historical and non-historical entities, acting as the key to alleviating the imbalance. A local-global message-passing module is proposed to comprehensively consider and capture multi-hop structural dependencies and local-global temporal evolution for the in-depth exploration of latent impact factors of different event types. A contrastive event classifier is used to classify events more accurately by incorporating local-global temporal patterns into contrastive learning. Therefore, AMCEN refines the prediction scope with the results of the contrastive event classification, followed by utilizing attention masking-based decoders to finalize the specific outcomes. The results of our experiments on four benchmark datasets highlight the superiority of AMCEN. Especially, the considerable improvements in Hits@1 prove that AMCEN can make more precise predictions about future occurrences.
Learning Effective and Efficient Embedding via an Adaptively-Masked Twins-based Layer
Yan, Bencheng, Wang, Pengjie, Zhang, Kai, Lin, Wei, Lee, Kuang-Chih, Xu, Jian, Zheng, Bo
Embedding learning for categorical features is crucial for the deep learning-based recommendation models (DLRMs). Each feature value is mapped to an embedding vector via an embedding learning process. Conventional methods configure a fixed and uniform embedding size to all feature values from the same feature field. However, such a configuration is not only sub-optimal for embedding learning but also memory costly. Existing methods that attempt to resolve these problems, either rule-based or neural architecture search (NAS)-based, need extensive efforts on the human design or network training. They are also not flexible in embedding size selection or in warm-start-based applications. In this paper, we propose a novel and effective embedding size selection scheme. Specifically, we design an Adaptively-Masked Twins-based Layer (AMTL) behind the standard embedding layer. AMTL generates a mask vector to mask the undesired dimensions for each embedding vector. The mask vector brings flexibility in selecting the dimensions and the proposed layer can be easily added to either untrained or trained DLRMs. Extensive experimental evaluations show that the proposed scheme outperforms competitive baselines on all the benchmark tasks, and is also memory-efficient, saving 60\% memory usage without compromising any performance metrics.
Drug Package Recommendation via Interaction-aware Graph Induction
Zheng, Zhi, Wang, Chao, Xu, Tong, Shen, Dazhong, Qin, Penggang, Huai, Baoxing, Liu, Tongzhu, Chen, Enhong
Recent years have witnessed the rapid accumulation of massive electronic medical records (EMRs), which highly support the intelligent medical services such as drug recommendation. However, prior arts mainly follow the traditional recommendation strategies like collaborative filtering, which usually treat individual drugs as mutually independent, while the latent interactions among drugs, e.g., synergistic or antagonistic effect, have been largely ignored. To that end, in this paper, we target at developing a new paradigm for drug package recommendation with considering the interaction effect within drugs, in which the interaction effects could be affected by patient conditions. Specifically, we first design a pre-training method based on neural collaborative filtering to get the initial embedding of patients and drugs. Then, the drug interaction graph will be initialized based on medical records and domain knowledge. Along this line, we propose a new Drug Package Recommendation (DPR) framework with two variants, respectively DPR on Weighted Graph (DPR-WG) and DPR on Attributed Graph (DPR-AG) to solve the problem, in which each the interactions will be described as signed weights or attribute vectors. In detail, a mask layer is utilized to capture the impact of patient condition, and graph neural networks (GNNs) are leveraged for the final graph induction task to embed the package. Extensive experiments on a real-world data set from a first-rate hospital demonstrate the effectiveness of our DPR framework compared with several competitive baseline methods, and further support the heuristic study for the drug package generation task with adequate performance.
OT-driven Multi-Domain Unsupervised Ultrasound Image Artifact Removal using a Single CNN
Huh, Jaeyoung, Khan, Shujaat, Ye, Jong Chul
Ultrasound imaging (US) often suffers from distinct image artifacts from various sources. Classic approaches for solving these problems are usually model-based iterative approaches that have been developed specifically for each type of artifact, which are often computationally intensive. Recently, deep learning approaches have been proposed as computationally efficient and high performance alternatives. Unfortunately, in the current deep learning approaches, a dedicated neural network should be trained with matched training data for each specific artifact type. This poses a fundamental limitation in the practical use of deep learning for US, since large number of models should be stored to deal with various US image artifacts. Inspired by the recent success of multi-domain image transfer, here we propose a novel, unsupervised, deep learning approach in which a single neural network can be used to deal with different types of US artifacts simply by changing a mask vector that switches between different target domains. Our algorithm is rigorously derived using an optimal transport (OT) theory for cascaded probability measures. Experimental results using phantom and in vivo data demonstrate that the proposed method can generate high quality image by removing distinct artifacts, which are comparable to those obtained by separately trained multiple neural networks.
Differentiable Window for Dynamic Local Attention
Nguyen, Thanh-Tung, Nguyen, Xuan-Phi, Joty, Shafiq, Li, Xiaoli
We propose Differentiable Window, a new neural module and general purpose component for dynamic window selection. While universally applicable, we demonstrate a compelling use case of utilizing Differentiable Window to improve standard attention modules by enabling more focused attentions over the input regions. We propose two variants of Differentiable Window, and integrate them within the Transformer architecture in two novel ways. We evaluate our proposed approach on a myriad of NLP tasks, including machine translation, sentiment analysis, subject-verb agreement and language modeling. Our experimental results demonstrate consistent and sizable improvements across all tasks.